Empirical Comparison of Boosting
Abstract
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced the variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates are used in conjunction with no pruning, as well as when the data is backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only "hard" areas but also outliers and noise.
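As a rough, modern illustration of the kind of comparison described in this abstract, the following minimal sketch (not the authors' original experimental code) contrasts a single decision tree, a Naive-Bayes classifier, and Bagging/AdaBoost ensembles of each using scikit-learn; the synthetic dataset, base-learner settings, and ensemble sizes are illustrative assumptions only.

```python
# Minimal sketch: single learners vs. Bagging and AdaBoost ensembles.
# Dataset, parameters, and ensemble sizes are illustrative assumptions,
# not the configuration used in the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                      n_estimators=50, random_state=0),
    "AdaBoost trees": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                         n_estimators=50, random_state=0),
    "Naive-Bayes": GaussianNB(),
    "AdaBoost Naive-Bayes": AdaBoostClassifier(GaussianNB(), n_estimators=50,
                                               random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    error = 1.0 - model.score(X_te, y_te)  # test-set classification error
    print(f"{name:22s} test error = {error:.3f}")
```

In line with the findings summarized above, one would expect the bagged and boosted trees to improve on the single tree, while boosting a very stable learner such as Naive-Bayes may not help and can even increase variance.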
Similar references
Combining Bagging and Boosting
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diverse set of classifiers using the same learning algorithm for the base classifiers. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, i...
Combining Bagging and Additive Regression
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diverse set of regression models using the same learning algorithm as the base learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in t...
Boosted Image Classification: An Empirical Study
The rapid pace of research in the fields of machine learning and image comparison has produced powerful new techniques in both areas. At the same time, research has been sparse on applying the best ideas from both fields to image classification and other forms of pattern recognition. This paper combines boosting with state-of-the-art methods in image comparison to carry out a comparative evaluat...
Empirical Margin Distributions and Bounding the Generalization Error of Combined Classifiers
We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods ...
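For context, the margin referred to in this excerpt is a standard quantity for voting classifiers (the usual definition, not something specific to this paper). For a binary combination $f(x) = \sum_t \alpha_t h_t(x)$ with base classifiers $h_t(x) \in \{-1, +1\}$ and weights $\alpha_t \ge 0$, the normalized margin on a labeled example $(x, y)$ with $y \in \{-1, +1\}$ is

\[
\operatorname{margin}(x, y) \;=\; \frac{y \sum_t \alpha_t h_t(x)}{\sum_t \alpha_t} \;\in\; [-1, 1],
\]

which is positive exactly when the weighted vote classifies $x$ correctly; bounds of the kind described above are stated in terms of the empirical distribution of this quantity over the training sample.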
An Empirical Comparison of Pruning Methods for Ensemble Classifiers
Many researchers have shown that ensemble methods such as Boosting and Bagging improve the accuracy of classification. Boosting and Bagging perform well with unstable learning algorithms such as neural networks or decision trees. Pruning decision tree classifiers is intended to make trees simpler and more comprehensible and to avoid over-fitting. However, it is known that pruning individual classif...
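To make the pruning-versus-no-pruning contrast concrete, here is a minimal sketch (illustrative only, not taken from any of the papers above) comparing bagged unpruned trees with bagged cost-complexity-pruned trees in scikit-learn; the dataset, the ccp_alpha value, and the ensemble size are assumptions.

```python
# Minimal sketch: Bagging with unpruned vs. cost-complexity-pruned trees.
# Dataset and hyperparameters are illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)

unpruned = BaggingClassifier(DecisionTreeClassifier(random_state=1),
                             n_estimators=50, random_state=1)
pruned = BaggingClassifier(DecisionTreeClassifier(ccp_alpha=0.01, random_state=1),
                           n_estimators=50, random_state=1)

for name, model in [("bagged unpruned trees", unpruned),
                    ("bagged pruned trees (ccp_alpha=0.01)", pruned)]:
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Whether pruning the individual trees helps or hurts the ensemble is exactly the kind of question these empirical comparisons investigate.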
Boosting Lazy Decision Trees
This paper explores the problem of how to construct lazy decision tree ensembles. We present and empirically evaluate a relevance-based boosting-style algorithm that builds a lazy decision tree ensemble customized for each test instance. From the experimental results, we conclude that our boosting-style algorithm significantly improves the performance of the base learner. An empirical comparison...